Data Ingestion:
- Import data from a MySQL database into HDFS using Apache Sqoop
- Export data from HDFS to a MySQL database using Sqoop
- Change the delimiter and file format of data during import using Sqoop
- Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
- Load data into and out of HDFS using the Hadoop File System (FS) commands
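The ingestion steps above might look like the following commands. This is an illustrative sketch, not a definitive recipe: the host, database, table, and HDFS paths (`mysql-host`, `retail_db`, `orders`, `/user/etl/...`) are hypothetical, and in practice credentials and target directories would match your cluster's layout.

```shell
# Import a MySQL table into HDFS, changing the field delimiter of the
# text output (hypothetical connection details; credentials are read
# from a permissions-protected password file rather than the command line)
sqoop import \
  --connect jdbc:mysql://mysql-host/retail_db \
  --username sqoop_user --password-file /user/sqoop/.password \
  --table orders \
  --target-dir /user/etl/orders \
  --fields-terminated-by '|' \
  --as-textfile
# To change the file format instead of the delimiter, swap the last
# flag for --as-avrodatafile or --as-parquetfile.

# Export data from HDFS back into a MySQL table
sqoop export \
  --connect jdbc:mysql://mysql-host/retail_db \
  --username sqoop_user --password-file /user/sqoop/.password \
  --table order_summaries \
  --export-dir /user/etl/order_summaries \
  --input-fields-terminated-by '|'

# Basic HDFS FS commands for moving data in and out by hand
hdfs dfs -put local_orders.csv /user/etl/staging/
hdfs dfs -get /user/etl/order_summaries/part-00000 ./summaries.csv
```

Flume sits apart from these one-shot commands: it runs as a long-lived agent whose source, channel, and sink are declared in a properties file, continuously delivering streaming events into an HDFS sink directory.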
Transform, Stage, Store:
- Load data from HDFS and store results back to HDFS using Spark
- Join disparate datasets together using Spark
- Calculate aggregate statistics (e.g., average or sum) using Spark
- Filter data into a smaller dataset using Spark
- Write a query that produces ranked or sorted data using Spark
Data Analysis:
- Read and create tables in the Hive metastore in a given schema
- Extract an Avro schema from a set of data files using Avro-tools
- Create a table in the Hive metastore using the Avro file format and an external schema file
- Improve query performance by creating partitioned tables in the Hive metastore
- Evolve an Avro schema by editing its JSON schema (.avsc) files
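The Hive-side tasks above might be expressed with DDL like the following sketch. The table names, columns, and HDFS paths are hypothetical; the external `.avsc` schema file could first be extracted from existing data files with Avro-tools (`avro-tools getschema part-m-00000.avro > orders.avsc`).

```sql
-- Avro-backed table whose schema lives in an external .avsc file on HDFS
CREATE EXTERNAL TABLE orders_avro
STORED AS AVRO
LOCATION '/user/hive/warehouse/orders'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///user/schemas/orders.avsc');

-- Partitioned table: queries that filter on order_date only scan
-- the matching partition directories, improving performance
CREATE EXTERNAL TABLE orders_by_date (
  order_id  INT,
  customer  STRING,
  amount    DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/orders_by_date';
```

For schema evolution, adding a new field with a `"default"` value to the JSON in `orders.avsc` lets readers using the new schema still deserialize files written under the old one.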